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Abstract 

We  look  at  the  properties  of  Givens’  rotations  computed  and  performed 
in  floating  point  arithmetic  that  conforms  to  the  IEEE  754  standard.  We 
describe  an  algorithm  for  constructing  computationally  OTthogonal  Givens’ 
rotations. 

1  Basic  Concepts 

We  begin  with  the  following  definitions; 

J'p  The  set  of  normalized  p-digit  binary  floating  point  numbers  where 
the  exponent  range  is  assumed  to  be  infinite  (i.e.  no  overflow  or 
underflow  can  occur). 

fl{x)  For  I  €  SR,  this  is  the  closest  element  of  JFp  '•o  r,  with  the  last 
binary  digit  0  in  case  of  a  tie. 

We  assume  throughout  that  the  operations  and  ^  are  cor¬ 

rectly  rounded. 

2  The  Classical  Given’s  Roatation 

Let  X  =  [1112]^  be  an  element  of  3?^.  The  classical  Givens’  rotation  is  an 
orthogonal  matrix 


Gx  = 


0 


(1) 


Givens’  rotations  are  usually  thought  of  geometrically  as  rotation  ma¬ 
trices.  Indeed,  many  derivations  are  given  in  terms  of  the  geometry  of 
the  vector  x  in  0?^  but  the  author  finds  that  these  lead  to  more  confusion 
than  understanding.  Another  way  of  writing  a  Givens’  rotation  is 

Xl  I2 
-X2  Xl 

Clearly  the  structure  of  G  guarantees  that  equation  1  will  be  satisfied 
(at  least  in  exact  arithmetic). 

Givens’  rotations  are  useful  because  they  are  orthogonal,  G^G  =  1. 
In  particular 


cs  —  sc 
2  ,  2 

SC  —  CS  c  +  s 


If  the  Pythagorean  theorem  (c^-f  =  1)  applies  to  the  elements,  then 

we  have  an  orthogonal  matrix  (Note  that  the  structure  of  the  matrix  guar¬ 
antees  that  the  off-diagonal  elements  are  zero,  even  in  floating-point  arith¬ 
metic  since  it  is  commutative).  Indeed,  the  key  to  computing  a  Givens’ 
transformation  is  the  ability  to  normalize  a  vector  in 

The  classical  algorithm  for  constructing  Givens’  rotations  ([1],  p.  45) 
is 


if  X2  =  0 
c  =  1; 
s  =  0; 

elseif  |i2|  >  |xi| 
t  =  11/12; 
s  =  l/sqrt(l  -f  t^); 
c  =  s  *t\ 
else 

t  =  12/xi; 
c  =  l/sqrt(l  +  Py, 
s  =  c  *  t; 
end 


We  propose  that  a  better  organization  of  this  algorithm  which  costs 
no  more  is 

if  12  =  0 
c  =  1; 
s  =  0; 

elseif  |x2|  >  |xi| 
t  =  X1/X2; 

s  =  sqrt(l/(l  -l-t^)); 
c  =  S*t-, 
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else 

t  =  12/xi; 

c  =  sqrt(l/(l  +  1^)); 
s  =  c*  t; 
end 

Experiments  using  extended  precision  arithmetic  show  that  the  second 
algorithm  is,  on  average,  slightly  closer  to  the  correctly  rounded  Givens’ 
rotation  than  the  first  one. 

Usually,  the  acme  of  a  floating-point  computation  is  to  give  a  correctly 
rounded  result.  Before  examining  what  this  means  here,  consider  the 
following  example. 

Example:  The  Givens’  rotation  G  such  that 


’  1 

■  V2  ■ 

1 

0 

G  = 


1 


L  -72 


In  double  precision  using  the  correctly  rounded  value  of  l/\/2  one  finds 
that  the  diagonal  elements  of  G^G  are  roughly  1  +  2.2204i0~^®.  This  is 
only  one  u/pfrom  the  correct  value  but  the  end  result  is  that  the  computed 
Givens’  rotation  is  not  orthogonal,  it  does  not  preserve  norms  but  rather 
increases  them. 

Note  that,  in  this  example,  our  proposed  improvement  yields  a  G  where 
each  of  the  elements  is  the  correctly  rounded  element  from  the  Givens’ 
rotation  while  the  clcissical  one  does  not. 

The  underlying  cause  of  the  problem  we  see  in  the  above  example 
is  not  due  to  rounding  error  in  computing  the  rotation,  rather  it  is  the 
inability  of  the  floating  point  system  to  represent  the  rotation  sufficiently 
well.  In  double  precision  IEEE  754  arithmetic  there  is  no  floating  point 
number  twice  whose  square  is  1. 


3  Computationally  Orthogonal  Givens’ 
Rotations 

We  now  propose  an  cJgorithm  for  computing  a  Givens’  rotation  which  is 
computationally  orthogonal.  That  is,  the  algorithm  computes  a  matrix 
G  such  that  G^G  —  I  to  full  working  precision,  the  computed  matrix  G 
win  be  close  to  the  classiccdly  computed  Givens’  rotation  in  the  Frobenius 
norm. 


if  X2  =  0 
c=  1; 
s  =  0; 
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elseif  |x2|  >  |ii| 

t  =  xi/i2; 

s  =  sqrt(l/(H-<2)); 

fc  = 

c  =  sqrt(l  —  b); 
else 

t  =  X2/xi; 

c=  sqrt(l/(l  +<2)); 

b  =  c^; 

s  =  sqrt(l  —  6); 
end 

We  will  need  to  explore  certain  properties  of  correctly  rounding  floating 
point  arithmetic  to  see  why  this  works. 

The  following  theorem  about  correctly  rounded  square  roots  is  based 
on  a  conjecture  of  Gragg  and  will  be  useful  in  what  follows.  It  is  stated 
and  proved  for  1/4  <  i  <  1  since  we  can  use  exact  argument  reduction 
on  the  exponent  to  put  any  floating  point  number  in  this  range. 

Theorem  1  Let  x  €  Tp  be  a  floating  point  number  such  that  1/4  <  x  <  1, 
and  let  y  =  fl(^/x)  be  the  correctly  rounded  floating  point  representation 
of  \fx.  Then  fl{y^)  is  within  1  ulp  of  x. 

Proof.  It  is  clear  that  1/2  <  y/x  <  1,  and  hence,  since  y  is  correctly 
rounded  it  follows  that  y  =  ‘^/x-l~b  where  K  The  exact  square 

of  y  is 


y^  =  X  +  2b\/x  +  6^ . 

Clearly,  if  and  x  differ  by  less  than  3/2  of  an  ulp  of  x  then  so  will 
fl{y^)  and  I  because  of  correct  rounding.  So  we  need  only  show  that 
\2S^/x  +  5^1  satisfies  this  bound.  Of  course 

\2^^/x  +  5^1  <  |2v^  + 

We  consider  two  cases.  First,  if  1/2  <  i  <  1  then  V®  <  Vl  -  2“?  < 
1  —  and  it  follows  that 


|2v^  +  5|  <  2Vf/  +  \b\ 

<  2  —  2~^  +  <  2. 

And  hence 

|25V^  +  5^|  <  2-P 

which  is  less  than  1  ulp  of  x. 

Second,  if  1/4  <  x  <  1/2  then  y/x  <  yj\l2  -  2-(p+i)  <  l/i/2  - 
2-(p+2)  follows  that 

\2y/i+S\  <  v^<3/2 
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And  hence 


which  is  less  than  3/2  nips  from  x. 

□ 

Theorem  2  Let  x  €  T-p  be  a  floating  point  number  such  thatlf2  <  i  <  1, 
then  the  difference  1  —  x  =  //(I  —  x).  Moreover,  the  last  bit  of  the  binary 
representation  of  f  1(1  —  x)  is  0. 

Proof.  If  X  =  1  or  I  =  1/2  the  result  is  obvious,  so  we  look  at  the  case 
where  1  >  i  >  1/2.  Let  d  =  x  — 1/2.  Then  1  —  x  =  1  —  (l/2  +  (i)  —  1/2  —  d. 
The  binary  expansion  for  d  must  look  like  .0bbb...bb  where  there  are  p  bits 
in  the  expansion,  the  first  is  zero,  and  the  rest  can  be  either  zeros  or  ones, 
but  at  least  one  of  them  is  a  one.  Clearly  now  we  are  computing 


.1000. ..00 
-  .0bbb...bb 

which  is  a  p-bit  number  whose  first  bit  must  be  zero  since  the  result  is 
less  than  1/2.  Hence,  upon  normalization  the  exactly  computed  result  is 
an  element  of  Pp  with  its  last  bit  a  zero.  □ 


4  Why  does  it  work? 

Let  us  consider  the  case  where  the  following  code  fragment  is  executed 

elseif  |x2|  >  |xi| 
t  =  xi/iz; 

s  =  sqrt{l/(l  +  <^)); 

b  =  s^; 

c  =  sqrt(l  —  b); 

In  this  case  0  <  <  <  1  and  thus  1/2  <  1/(1  +<^)  <  1.  After  taking  the 
square  root  we  square  s  and  make  an  assignment  to  force  a  rounding,  b 
should  be  a  number  between  1/2  and  1.  It  is  possible  for  b  to  be  slightly 
less  than  1/2  at  this  point,  we  will  discuss  this  later.  For  now  assume 
b  >  1/2.  We  now  compute  c  =  s/l  —  b,  there  is  no  reason  to  force  an 
intermediate  rounding  for  1—6  with  an  assignment  since  this  is  computed 
exactly  anyway.  Now  note  that  from  theorem  1  we  know  that  =  (1  — 
6)  ±  lulp  where  the  error  is  one  ulp  of  6.  Let  .sss...ss  be  binary  expansion 
of  fl{s^),  and  let  .066. ..660  be  the  binary  expansion  of  1  —  6,  then,  in  the 
worst  case,  the  binary  expansion  of  /l(c^)  must  be  .066. ..660  ±  .000. ..006, 
where  6  can  be  either  0  or  1,  it  will  be  more  usual  that  the  6  will  be  further 
to  the  right.  Then,  in  binary,  +  s^  must  be 
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.SSS...SS 


+  .Qbh...bbO 
±  .000.. .005 

The  sum  is  either  1,  or  it  is  1.000. ..001  which  rounds  to  1  by  the  round 
to  even  rule,  or  .111... Ill  which  also  rounds  to  1  by  the  round  to  even 
rule.  If  the  5  is  further  to  the  right  then  simple  rounding  will  give  us  a  1. 

There  is  a  problem  if  b  is  less  than  1/2.  This  happens  in  IEEE  754 
single  precision  CcJculations  since  the  square  of  the  square  root  of  1/2  here 
is  less  than  1/2.  This  only  occurs  if  1/(1  +t^)  is  sufficiently  close  to  1/2 
so  one  can  avoid  it  with  a  simple  trap.  This  is  not  a  problem  in  IEEE  754 
double  precision  calculations  and  the  code  wiU  work  as  written. 

One  can  make  a  variation  on  this  algorithm  which  preserves  the  com¬ 
putational  orthogonality  but  reduces  the  Frobenius  norm  of  the  difference 
between  the  computationally  orthogonal  Givens’  rotation  and  the  classi¬ 
cally  computed  one.  To  do  this  first  compute  the  classical  Givens’  rota¬ 
tion.  Then  check  to  see  if  it  is  computationally  orthogonal.  If  it  is  then 
stop.  Otherwise  compute  the  computationally  orthogonal  rotation.  Then 
use  bisection  on  the  smaller  of  c  or  s  to  find  the  computationally  orthog¬ 
onal  rotation  which  is  closest  to  the  classical  one.  It  is  not  clear  that 
one  would  want  to  do  this.  The  algorithm  we  have  given  tries  to  find  a 
computationally  orthogonal  rotation  that  is  as  close  to  a  truly  orthogonal 
rotation  as  possible.  Using  a  bisection  as  just  described  would  take  you 
away  from  this  point. 

5  Another  Algorithm  for  Rotations 

The  following  is  yet  another  variation  on  the  theme  of  orthogonal  rota¬ 
tions.  It  uses  techniques  from  simulated  extended  precision  (SEP)  arith¬ 
metic  to  do  its  work  and  is  written  for  IEEE  754  double  precision  arith¬ 
metic  (although  it  can  be  modified  to  work  in  single  precision).  Experi¬ 
ments  with  random  vectors  from  with  uniformly  distributed  elements 
show  that  it  is  computationally  orthogoncd  roughly  95%  of  the  time.  These 
experiments  also  show  that  when  the  rotation  is  not  exactly  orthogonal 
then  the  diagonal  of  G^G  contains  the  number  directly  to  the  left  of  1  in 
the  floating  point  system. 

if  i2  =  0 
c  =  1; 
s  =  0; 

elseif  I12I  >  |ii| 
t  =  X1/12; 

s  =  sqrt(l/(l  -I-  f})-, 
bb  =  134217728  +  s; 
b  =  {s-bb)+  bb; 


6 


bb  =  s  —  b; 
imp  =  b^ ; 
dd  =  2  *  b  *  bb; 

d  =  trap  +  dd; 

dd  =  ((trap  —  d)  +  dd)  +  bb^- 
6  =  1  —  d; 

6  =  6  —  dd' 
c  =  sign{t)  *  sqrt(6); 
else 

t  =  i2/ii; 

c  =  sqrt(l/(l  +  i^)); 

66  =  134217728  *  c; 

6  =  (c  —  66)  +  66; 
bb  =  c  —  b', 
trap  =  6^; 
dd  —  2  *  b  *  bb; 
d  =  trap  +  dd; 

dd  =  {(trap  —  d)  +  dd)  +  66^; 
b  =  1  —  d; 

6  =  6  —  dd; 
s  =  sign{t)  *  sqrt(6); 
end 
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